Chicago Burglaries

By Annivas Exarchos

Introduction

The overall objective of this project will be to analyze burglary data for Chicago, IL from 2015 to 2019.

Throughout this tutorial, we will attempt to find when and where burglaries are most likely to take place, while also complementing our analysis with interesting burglary trends and statistics.

Required Tools

This project is written in Python 3 (the environment shown below runs Python 3.8).

You will need the following libraries: pandas, numpy, matplotlib, folium, and sodapy.

Installations & Imports

In [69]:
!pip install folium # install to create maps
Requirement already satisfied: folium in /Users/annivas/opt/anaconda3/lib/python3.8/site-packages (0.11.0)
Requirement already satisfied: jinja2>=2.9 in /Users/annivas/opt/anaconda3/lib/python3.8/site-packages (from folium) (2.11.2)
Requirement already satisfied: branca>=0.3.0 in /Users/annivas/opt/anaconda3/lib/python3.8/site-packages (from folium) (0.4.1)
Requirement already satisfied: requests in /Users/annivas/opt/anaconda3/lib/python3.8/site-packages (from folium) (2.24.0)
Requirement already satisfied: numpy in /Users/annivas/opt/anaconda3/lib/python3.8/site-packages (from folium) (1.18.5)
Requirement already satisfied: MarkupSafe>=0.23 in /Users/annivas/opt/anaconda3/lib/python3.8/site-packages (from jinja2>=2.9->folium) (1.1.1)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /Users/annivas/opt/anaconda3/lib/python3.8/site-packages (from requests->folium) (1.25.9)
Requirement already satisfied: certifi>=2017.4.17 in /Users/annivas/opt/anaconda3/lib/python3.8/site-packages (from requests->folium) (2020.6.20)
Requirement already satisfied: idna<3,>=2.5 in /Users/annivas/opt/anaconda3/lib/python3.8/site-packages (from requests->folium) (2.10)
Requirement already satisfied: chardet<4,>=3.0.2 in /Users/annivas/opt/anaconda3/lib/python3.8/site-packages (from requests->folium) (3.0.4)
In [70]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import folium
from sodapy import Socrata
from folium.plugins import HeatMap

1. Data Collection

This is the first stage of the data lifecycle. Here, we will collect all the data that we will need for our project.

The main dataset that we will be using contains all reported crimes in the city of Chicago since 2001 and can be found in the official Chicago Data Portal.

The data is stored in a large csv file, which we will be accessing using the sodapy client through the Socrata Open Data API.

From this file we will only extract crime data for the years 2015-2019.

In [71]:
# These can be found in the data portal
domain = 'data.cityofchicago.org'
dataset_id = 'ijzp-q8t2'

# Generate token by creating an account for the data portal
token = 'Lkysyak9elTtcNXRVmfsj9YLX'

client = Socrata(domain, token)

# Get data for 2015-2019
results = client.get(dataset_id, where="date >= '2015-01-01' and date < '2020-01-01'", limit=2000000)

# Store into pandas dataframe
crime_table = pd.DataFrame.from_dict(results)

# Display first 5 rows of dataframe
crime_table.head()
Out[71]:
id case_number date block iucr primary_type description location_description arrest domestic ... ward community_area fbi_code year updated_on x_coordinate y_coordinate latitude longitude location
0 11768614 JC361321 2015-01-01T00:00:00.000 030XX W 41ST ST 1751 OFFENSE INVOLVING CHILDREN CRIMINAL SEXUAL ABUSE BY FAMILY MEMBER APARTMENT True True ... 12 58 17 2015 2020-11-26T15:46:07.000 NaN NaN NaN NaN NaN
1 12179862 JD383609 2015-01-01T00:00:00.000 064XX S CARPENTER ST 1754 OFFENSE INVOLVING CHILDREN AGGRAVATED SEXUAL ASSAULT OF CHILD BY FAMILY M... APARTMENT False False ... 16 68 02 2015 2020-09-30T15:52:35.000 NaN NaN NaN NaN NaN
2 12179870 JD383623 2015-01-01T00:00:00.000 052XX S WOOD ST 1754 OFFENSE INVOLVING CHILDREN AGGRAVATED SEXUAL ASSAULT OF CHILD BY FAMILY M... RESIDENCE False False ... 16 61 02 2015 2020-09-30T15:52:35.000 NaN NaN NaN NaN NaN
3 10135798 HY324366 2015-01-01T00:00:00.000 051XX W MONTANA ST 0266 CRIMINAL SEXUAL ASSAULT PREDATORY RESIDENCE False True ... 31 19 02 2015 2020-08-29T15:47:37.000 1141746 1915820 41.925078599 -87.754584486 {'latitude': '41.925078599', 'longitude': '-87...
4 12056114 JD238384 2015-01-01T00:00:00.000 062XX S INDIANA AVE 1752 OFFENSE INVOLVING CHILDREN AGGRAVATED CRIMINAL SEXUAL ABUSE BY FAMILY MEMBER RESIDENCE True True ... 20 40 17 2015 2020-08-28T15:49:20.000 1185674 1871645 41.802933396 -87.594571086 {'latitude': '41.802933396', 'longitude': '-87...

5 rows × 22 columns
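
The request above downloads every crime type with all 22 columns and filters afterwards. As a rough sketch, the same filtering could be pushed to the server through the `select` and `where` keyword arguments that sodapy's `get` accepts. The helper name `build_burglary_query` is hypothetical, and the request itself is left commented out because it needs network access and a token.

```python
def build_burglary_query(start_year, end_year, columns, limit=2000000):
    """Build keyword arguments for Socrata.get (hypothetical helper).

    Covers start_year through end_year inclusive and keeps only burglaries.
    """
    where = (
        f"date >= '{start_year}-01-01' and date < '{end_year + 1}-01-01' "
        "and primary_type = 'BURGLARY'"
    )
    return {"select": ", ".join(columns), "where": where, "limit": limit}


columns = ['id', 'primary_type', 'description', 'arrest', 'location',
           'latitude', 'longitude', 'date', 'year']
params = build_burglary_query(2015, 2019, columns)

# Assumed usage with the client created earlier (requires network access):
# results = client.get(dataset_id, **params)
```

The comparison of crime types later in the tutorial still needs the full table, so a narrowed query like this only suits the burglary-specific steps.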

Additionally, we will be using some data downloaded from the FBI's Crime Data Explorer in CSV format. These burglary-specific datasets include statistics about victims' and offenders' age, sex, and race, as well as the relationship between victims and offenders and other crimes that burglary offenders have been charged with.

To import this data into dataframes, we will be using pandas' read_csv method.

In [111]:
# Burglary offenders by age
offender_age = pd.read_csv("https://annivas.github.io/files/offender-age-2015-2019.csv")
offender_age
Out[111]:
Key Value
0 0-9 4041
1 10-19 182462
2 20-29 306174
3 30-39 231868
4 40-49 104887
5 50-59 69272
6 60-69 10986
7 70-79 1744
8 80-89 496
9 90-Older 2269
10 Unknown 546607
In [112]:
# Burglary offenders by sex
offender_sex = pd.read_csv("https://annivas.github.io/files/offender-sex-2015-2019.csv")
offender_sex
Out[112]:
Key Value Percent
0 Male 799186 0.548295
1 Female 185725 0.127420
2 Unknown 472674 0.324286
In [113]:
# Burglary offenders by race
offender_race = pd.read_csv("https://annivas.github.io/files/offender-race-2015-2019.csv")
offender_race
Out[113]:
Key Value
0 Asian 6421
1 Native Hawaiian 0
2 Black or African American 293696
3 American Indian or Alaska Native 10945
4 White 618786
5 Unknown 526479
In [114]:
# Burglary victims by age
victim_age = pd.read_csv("https://annivas.github.io/files/victim-age-2015-2019.csv")
victim_age
Out[114]:
Key Value
0 0-9 7600
1 10-19 70293
2 20-29 427706
3 30-39 442191
4 40-49 375548
5 50-59 370236
6 60-69 266728
7 70-79 130137
8 80-89 47165
9 90-Older 9419
10 Unknown 29590
In [115]:
# Burglary victims by sex
victim_sex = pd.read_csv("https://annivas.github.io/files/victim-sex-2015-2019.csv")
victim_sex
Out[115]:
Key Value Percent
0 Male 1164541 0.535024
1 Female 996356 0.457755
2 Unknown 15716 0.007220
In [116]:
# Burglary victims by race
victim_race = pd.read_csv("https://annivas.github.io/files/victim-race-2015-2019.csv")
victim_race
Out[116]:
Key Value
0 Asian 41819
1 Native Hawaiian 0
2 Black or African American 413438
3 American Indian or Alaska Native 10844
4 White 1607281
5 Unknown 99278
In [117]:
# Relationship between burglary offenders and victims
victim_offender_relationship = pd.read_csv("https://annivas.github.io/files/victim-offender-relationship-2015-2019.csv")
victim_offender_relationship
Out[117]:
Key Value
0 Acquaintance 19045
1 Babysittee 45
2 Boyfriend/Girlfriend 13371
3 Child of Boyfriend/Girlfriend 190
4 Child 454
5 Employee 134
6 Employer 216
7 Friend 2814
8 Grandchild 14
9 Grandparent 388
10 Homosexual Relationship 328
11 In-Law 500
12 Neighbor 2840
13 Other Family Member 2312
14 Otherwise Known 14171
15 Parent 2025
16 Relationship Unknown 51544
17 Sibling 1020
18 Stepchild 101
19 Spouse 1856
20 Stepparent 263
21 Stepsibling 49
22 Stranger 26995
23 Offender 478
24 Ex Spouse 2706
25 Common Law Spouse 249
In [118]:
# Other offenses linked to burglary offenders
linked_offenses = pd.read_csv("https://annivas.github.io/files/linked-offenses-2015-2019.csv")
linked_offenses
Out[118]:
Key Value
0 Identity Theft 1147
1 Fondling 982
2 Bribery 982
3 Stolen Property Offenses 12711
4 Pocket-Picking 340
5 Murder and Nonnegligent Manslaughter 344
6 Welfare Fraud 21
7 Extortion/Blackmail 109
8 Theft from Coin-operated Machine or Device 889
9 Statutory Rape 31
10 Intimidation 8189
11 Gambling Equipment Violation 2
12 Shoplifting 2697
13 Robbery 6808
14 Incest 0
15 Animal Cruelty 145
16 Destruction/Damage/Vandalism of Property 295271
17 Operating/Promoting/Assisting Gambling 6
18 Hacking/Computer Invasion 69
19 Sports Tampering 0
20 Kidnapping/Abduction 4240
21 Motor Vehicle Theft 29080
22 embezzlement 346
23 Impersonation 1743
24 Weapon Law Violations 7462
25 Negligent Manslaughter 3
26 Wire Fraud 71
27 Theft From Building 43694
28 Theft of Motor Vehicle Parts or Accessories 1915
29 False Pretenses/Swindle/Confidence game 4113
30 Drug Equipment Violations 8457
31 Sexual Assault with an Object 158
32 Counterfeiting/Forgery 2785
33 Purse-snatching 267
34 Prostitution 24
35 Credit Card/Automated Teller Machine Fraud 5315
36 Pornography/Obscene Material 79
37 Human Trafficking, Commercial Sex Acts 2
38 Betting/Wagering 0
39 Human Trafficking, Involuntary Servitude 0
40 Purchasing Prostitution 1
41 Theft from Motor Vehicle 20263
42 Drug/Narcotic Violations 15014
43 Aggravated Assault 17962
44 Simple Assault 31799
45 Assisting or Promoting Prostitution 12
46 Burglary/Breaking & Entering 0
47 Arson 2584
48 All Other Larceny 81300
49 Sodomy 253
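
The eight files above share one URL pattern, so they could also be loaded in a loop. This is only a sketch: the names `BASE_URL`, `NAMES`, and `load_tables` are illustrative, and the download call is commented out because it needs network access.

```python
import pandas as pd

BASE_URL = "https://annivas.github.io/files/"
NAMES = [
    "offender-age", "offender-sex", "offender-race",
    "victim-age", "victim-sex", "victim-race",
    "victim-offender-relationship", "linked-offenses",
]

# Map each dataset name to its CSV URL
urls = {name: f"{BASE_URL}{name}-2015-2019.csv" for name in NAMES}


def load_tables(url_map):
    """Read every CSV in url_map into a DataFrame keyed by dataset name."""
    return {name: pd.read_csv(url) for name, url in url_map.items()}


# tables = load_tables(urls)
# tables["offender-age"].head()
```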

2. Data Processing

Now that we have collected all the necessary data, it's time to process it and organize it in a way that will serve our needs for the remainder of the project.

First, let's extract all burglaries from the crime table into a new table. We will only choose the columns we need, as the initial crime table is filled with unnecessary information.

In [119]:
# Get burglaries from crime table (only the columns we need).
# .copy() gives an independent table, so adding columns later does not trigger SettingWithCopyWarning.
burglary_table = crime_table.loc[crime_table['primary_type'] == 'BURGLARY', ['id', 'primary_type', 'description', 'arrest', 'location', 'latitude', 'longitude', 'date', 'year']].copy()
# Display first 5 rows
burglary_table.head()
Out[119]:
id primary_type description arrest location latitude longitude date year
328 9913600 BURGLARY UNLAWFUL ENTRY False {'latitude': '41.975141744', 'longitude': '-87... 41.975141744 -87.76454628 2015-01-01T00:01:00.000 2015
388 9911420 BURGLARY FORCIBLE ENTRY False {'latitude': '41.882806623', 'longitude': '-87... 41.882806623 -87.705030858 2015-01-01T01:00:00.000 2015
446 9911551 BURGLARY FORCIBLE ENTRY False {'latitude': '41.901003465', 'longitude': '-87... 41.901003465 -87.649282987 2015-01-01T02:00:00.000 2015
451 9914392 BURGLARY UNLAWFUL ENTRY False {'latitude': '41.873236088', 'longitude': '-87... 41.873236088 -87.740923693 2015-01-01T02:00:00.000 2015
581 9911384 BURGLARY FORCIBLE ENTRY False {'latitude': '41.996215265', 'longitude': '-87... 41.996215265 -87.716862144 2015-01-01T04:50:00.000 2015

Now that we have only the data we need, let's add some new columns derived from the "date" column.

In [120]:
# Create month column. Months are represented as ints from 1 (January) to 12 (December).
# We could represent months as strings, but integers facilitate plotting.
burglary_table['month'] = pd.DatetimeIndex(burglary_table['date']).month
# Display first 5 rows
burglary_table.head()
Out[120]:
id primary_type description arrest location latitude longitude date year month
328 9913600 BURGLARY UNLAWFUL ENTRY False {'latitude': '41.975141744', 'longitude': '-87... 41.975141744 -87.76454628 2015-01-01T00:01:00.000 2015 1
388 9911420 BURGLARY FORCIBLE ENTRY False {'latitude': '41.882806623', 'longitude': '-87... 41.882806623 -87.705030858 2015-01-01T01:00:00.000 2015 1
446 9911551 BURGLARY FORCIBLE ENTRY False {'latitude': '41.901003465', 'longitude': '-87... 41.901003465 -87.649282987 2015-01-01T02:00:00.000 2015 1
451 9914392 BURGLARY UNLAWFUL ENTRY False {'latitude': '41.873236088', 'longitude': '-87... 41.873236088 -87.740923693 2015-01-01T02:00:00.000 2015 1
581 9911384 BURGLARY FORCIBLE ENTRY False {'latitude': '41.996215265', 'longitude': '-87... 41.996215265 -87.716862144 2015-01-01T04:50:00.000 2015 1
In [121]:
# Create day column. Days are represented as ints from 0 (Monday) to 6 (Sunday).
# We could represent days as strings, but integers facilitate plotting.
burglary_table['day'] = pd.DatetimeIndex(burglary_table['date']).weekday
# Display first 5 rows
burglary_table.head()
Out[121]:
id primary_type description arrest location latitude longitude date year month day
328 9913600 BURGLARY UNLAWFUL ENTRY False {'latitude': '41.975141744', 'longitude': '-87... 41.975141744 -87.76454628 2015-01-01T00:01:00.000 2015 1 3
388 9911420 BURGLARY FORCIBLE ENTRY False {'latitude': '41.882806623', 'longitude': '-87... 41.882806623 -87.705030858 2015-01-01T01:00:00.000 2015 1 3
446 9911551 BURGLARY FORCIBLE ENTRY False {'latitude': '41.901003465', 'longitude': '-87... 41.901003465 -87.649282987 2015-01-01T02:00:00.000 2015 1 3
451 9914392 BURGLARY UNLAWFUL ENTRY False {'latitude': '41.873236088', 'longitude': '-87... 41.873236088 -87.740923693 2015-01-01T02:00:00.000 2015 1 3
581 9911384 BURGLARY FORCIBLE ENTRY False {'latitude': '41.996215265', 'longitude': '-87... 41.996215265 -87.716862144 2015-01-01T04:50:00.000 2015 1 3
In [122]:
# Create time column. Time is expressed in hours and hours are represented as ints from 0 (12 am) to 23 (11 pm)
burglary_table['time'] = pd.DatetimeIndex(burglary_table['date']).hour
# Display first 5 rows
burglary_table.head()
Out[122]:
id primary_type description arrest location latitude longitude date year month day time
328 9913600 BURGLARY UNLAWFUL ENTRY False {'latitude': '41.975141744', 'longitude': '-87... 41.975141744 -87.76454628 2015-01-01T00:01:00.000 2015 1 3 0
388 9911420 BURGLARY FORCIBLE ENTRY False {'latitude': '41.882806623', 'longitude': '-87... 41.882806623 -87.705030858 2015-01-01T01:00:00.000 2015 1 3 1
446 9911551 BURGLARY FORCIBLE ENTRY False {'latitude': '41.901003465', 'longitude': '-87... 41.901003465 -87.649282987 2015-01-01T02:00:00.000 2015 1 3 2
451 9914392 BURGLARY UNLAWFUL ENTRY False {'latitude': '41.873236088', 'longitude': '-87... 41.873236088 -87.740923693 2015-01-01T02:00:00.000 2015 1 3 2
581 9911384 BURGLARY FORCIBLE ENTRY False {'latitude': '41.996215265', 'longitude': '-87... 41.996215265 -87.716862144 2015-01-01T04:50:00.000 2015 1 3 4
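
As an alternative sketch, the three derived columns can be produced by parsing the date strings once with `pd.to_datetime` and reading the parts through the `.dt` accessor. The small `sample` table below is a stand-in built from three timestamps that appear in this notebook.

```python
import pandas as pd

# Stand-in for burglary_table['date'], using timestamps shown in this notebook
sample = pd.DataFrame({
    "date": [
        "2015-01-01T00:01:00.000",
        "2015-01-01T04:50:00.000",
        "2017-10-19T20:30:00.000",
    ]
})

# Parse once, then derive every column from the parsed values
parsed = pd.to_datetime(sample["date"])
sample["month"] = parsed.dt.month    # 1 (January) to 12 (December)
sample["day"] = parsed.dt.weekday    # 0 (Monday) to 6 (Sunday)
sample["time"] = parsed.dt.hour      # 0 to 23

print(sample)
```

Parsing once avoids building a new `DatetimeIndex` for each derived column.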

For the complementary data we imported, the only processing that needs to be done is setting the "Key" column as the index of each table and sorting the tables by "Value" to facilitate plotting.

In [123]:
offender_age = offender_age.set_index('Key').sort_values(by="Value", ascending=False)
offender_sex = offender_sex.set_index('Key').sort_values(by="Value", ascending=False)
offender_race = offender_race.set_index('Key').sort_values(by="Value", ascending=False)
victim_age = victim_age.set_index('Key').sort_values(by="Value", ascending=False)
victim_sex = victim_sex.set_index('Key').sort_values(by="Value", ascending=False)
victim_race = victim_race.set_index('Key').sort_values(by="Value", ascending=False)
victim_offender_relationship = victim_offender_relationship.set_index('Key').sort_values(by="Value", ascending=False)
linked_offenses = linked_offenses.set_index('Key').sort_values(by="Value", ascending=False)
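
Since the same two steps are applied to every table, they could be wrapped in a small helper. The function name `prepare` is hypothetical, and the demo table reuses the three rows of the offender sex data shown earlier.

```python
import pandas as pd


def prepare(table):
    """Index a Key/Value table by 'Key' and sort it by 'Value', largest first."""
    return table.set_index("Key").sort_values(by="Value", ascending=False)


# Stand-in for offender_sex (values copied from the output above)
demo = pd.DataFrame({
    "Key": ["Female", "Male", "Unknown"],
    "Value": [185725, 799186, 472674],
})
prepared = prepare(demo)
print(prepared)
```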

3. Exploratory Data Analysis & Visualization

Now that our data is clean and organized, it's time to analyze it through the use of visualizations. This is usually the most interesting part of the data lifecycle, as we will attempt to plot our data and observe possible trends.

First, we will use the original crime table to count the occurrences of each type of crime from 2015 to 2019.

In [124]:
# Calculate the number of occurrences of each crime type in crime_table
crime_type_occ = crime_table['primary_type'].value_counts()
crime_type_occ
Out[124]:
THEFT                                311036
BATTERY                              247753
CRIMINAL DAMAGE                      143240
ASSAULT                               96110
DECEPTIVE PRACTICE                    93092
OTHER OFFENSE                         86117
NARCOTICS                             77579
BURGLARY                              61849
MOTOR VEHICLE THEFT                   51699
ROBBERY                               51149
CRIMINAL TRESPASS                     33247
WEAPONS VIOLATION                     23295
OFFENSE INVOLVING CHILDREN            11653
PUBLIC PEACE VIOLATION                 8419
CRIM SEXUAL ASSAULT                    6892
INTERFERENCE WITH PUBLIC OFFICER       6183
SEX OFFENSE                            5534
PROSTITUTION                           4255
HOMICIDE                               3068
ARSON                                  2161
LIQUOR LAW VIOLATION                   1210
CRIMINAL SEXUAL ASSAULT                1039
GAMBLING                               1033
STALKING                                953
KIDNAPPING                              926
INTIMIDATION                            741
CONCEALED CARRY LICENSE VIOLATION       505
OBSCENITY                               331
NON-CRIMINAL                            142
HUMAN TRAFFICKING                        60
PUBLIC INDECENCY                         59
OTHER NARCOTIC VIOLATION                 29
NON - CRIMINAL                           25
NON-CRIMINAL (SUBJECT SPECIFIED)          6
Name: primary_type, dtype: int64

From the above data, theft looks to be the most common crime in Chicago, while burglary is 8th.
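
The counts above list some categories under more than one spelling ("CRIM SEXUAL ASSAULT" and "CRIMINAL SEXUAL ASSAULT", plus three "NON-CRIMINAL" variants). Assuming these labels refer to the same categories, which the source does not confirm, they could be merged before ranking. The `label_map` below is an illustrative choice, and the series holds a handful of values copied from the output.

```python
import pandas as pd

# A few rows copied from the value_counts output above
counts = pd.Series({
    "BURGLARY": 61849,
    "CRIM SEXUAL ASSAULT": 6892,
    "CRIMINAL SEXUAL ASSAULT": 1039,
    "NON-CRIMINAL": 142,
    "NON - CRIMINAL": 25,
    "NON-CRIMINAL (SUBJECT SPECIFIED)": 6,
})

# Assumed mapping from variant spellings to a single label
label_map = {
    "CRIM SEXUAL ASSAULT": "CRIMINAL SEXUAL ASSAULT",
    "NON - CRIMINAL": "NON-CRIMINAL",
    "NON-CRIMINAL (SUBJECT SPECIFIED)": "NON-CRIMINAL",
}

# Rename the index, then add up rows that now share a label
merged = (
    counts.rename(index=label_map)
    .groupby(level=0)
    .sum()
    .sort_values(ascending=False)
)
print(merged)
```

Merging these labels does not change burglary's position in the ranking.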

Now, let's plot the 12 most common types of crime in a pie chart to get a better idea.

In [125]:
crime_type_occ[0:12].plot(kind='pie', figsize=(10, 10), title="Types of Crime", autopct='%1.1f%%')
plt.ylabel("")
plt.show()

By using the burglary_table, we can plot the number of burglaries by year and hopefully observe a trend.

In [126]:
burglary_table['year'].value_counts().sort_index().plot(kind='bar', rot=0, title="Burglaries by Year", figsize=(10, 8))
plt.ylabel("Number of Burglaries")
plt.show()

From the above bar plot, we can tell that 2016 had the most burglaries of the five years. More importantly, the number of burglaries seems to have decreased every year from 2016 through 2019.

Now let's try to visualize burglaries by month. By counting the occurrences of each month in our burglary table and dividing by 5, we can get the average number of burglaries per calendar month over the five years.

In [127]:
# Total number of burglaries for 5 years, grouped by month
burglaries_by_month = burglary_table['month'].value_counts().sort_index()
# Divide value for each month by 5 to get normalized number of burglaries per month
burglaries_by_month = burglaries_by_month.apply(lambda x: x/5)
burglaries_by_month
Out[127]:
1     1053.8
2      777.2
3      865.0
4      895.4
5     1006.2
6     1030.6
7     1156.0
8     1178.2
9     1103.8
10    1135.8
11    1076.4
12    1091.4
Name: month, dtype: float64
In [128]:
figure(num=None, figsize=(14, 8))
x = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
y = burglaries_by_month
plt.bar(x,y)
plt.title("Burglaries by Month")
plt.ylabel("Number of Burglaries")
plt.show()

After plotting the average number of burglaries per month, we can start observing some trends. The most burglaries occur in August (a little under 1,200 on average), followed by July, which could be because many homes are left unoccupied during summer vacations. February has the lowest average (about two thirds of August's), which suggests that households are safest during that month, although part of the gap is simply because February is the shortest month.
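
Months differ in length, so one way to compare them more fairly is to divide each monthly average by the average number of days in that month over 2015-2019. The sketch below uses the monthly averages printed above; the helper `avg_days_in_month` is a hypothetical name.

```python
import calendar

# Average burglaries per calendar month, copied from the output above
monthly_avg = {
    1: 1053.8, 2: 777.2, 3: 865.0, 4: 895.4, 5: 1006.2, 6: 1030.6,
    7: 1156.0, 8: 1178.2, 9: 1103.8, 10: 1135.8, 11: 1076.4, 12: 1091.4,
}
years = range(2015, 2020)


def avg_days_in_month(month):
    """Average length of a month across the study years (accounts for leap years)."""
    return sum(calendar.monthrange(y, month)[1] for y in years) / len(years)


# Burglaries per day, by month
per_day = {m: avg / avg_days_in_month(m) for m, avg in monthly_avg.items()}

for m, rate in per_day.items():
    print(f"{calendar.month_name[m]:<10}{rate:6.1f}")
```

On a per-day basis February and March come out almost level (roughly 27.6 and 27.9 burglaries per day), while August remains the highest.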

We can also plot the number of burglaries by day of the week.

In [129]:
# Total number of burglaries for 5 years, grouped by day of week
burglaries_by_day = burglary_table['day'].value_counts().sort_index()
# Divide value for each day by 5 to get number of burglaries for each year by day
burglaries_by_day = burglaries_by_day.apply(lambda x: x/5)
# Divide value for each day by 52.1429 (number of weeks in a year) to get normalized number of burglaries by day of week
burglaries_by_day = burglaries_by_day.apply(lambda x: x/52.1429)
burglaries_by_day
Out[129]:
0    36.165998
1    35.279971
2    35.475587
3    35.126546
4    37.454764
5    30.504632
6    27.221347
Name: day, dtype: float64
In [130]:
figure(num=None, figsize=(14, 8))
x = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
y = burglaries_by_day
plt.bar(x,y)
plt.title("Burglaries by Day of the Week")
plt.ylabel("Number of Burglaries")
plt.show()

There seem to be about 35 burglaries per weekday in Chicago, while the number is lower on weekends. This could be attributed to the fact that most people are at work on weekdays, and empty houses make better targets for burglars. Weekends seem to be less suitable days for burglaries, as most families stay at home.
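
The constant 52.1429 used above is an average number of weeks per year. As a rough check, the exact number of times each weekday occurs between 2015-01-01 and 2019-12-31 can be counted with the standard library. The variable names are illustrative.

```python
from collections import Counter
from datetime import date, timedelta

start, end = date(2015, 1, 1), date(2019, 12, 31)
n_days = (end - start).days + 1  # both endpoints included

# Count how often each weekday (0 = Monday ... 6 = Sunday) occurs in the range
weekday_counts = Counter(
    (start + timedelta(days=i)).weekday() for i in range(n_days)
)

for day in range(7):
    print(day, weekday_counts[day])

# The five-year totals per weekday could then be divided by weekday_counts[day]
# instead of by 5 * 52.1429.
```

Each weekday occurs 260 or 261 times in this range, so the approximation in the cell above changes the averages only slightly.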

Now let's dive a step deeper, and plot the number of burglaries by time of the day.

In [131]:
# Total number of burglaries for 5 years, grouped by hour in the day
burglaries_by_time = burglary_table['time'].value_counts().sort_index()
# Divide value for each hour by 5 to get number of burglaries for each year per hour
burglaries_by_time = burglaries_by_time.apply(lambda x: x/5)
# Divide value for each hour by 8760 (number of hours in a year) to get normalized number of burglaries by time
burglaries_by_time = burglaries_by_time.apply(lambda x: x/8760)
burglaries_by_time
Out[131]:
0     0.056233
1     0.035913
2     0.035228
3     0.037329
4     0.035571
5     0.038836
6     0.045297
7     0.068311
8     0.083174
9     0.083059
10    0.073539
11    0.065251
12    0.084384
13    0.062945
14    0.069749
15    0.071963
16    0.064886
17    0.071210
18    0.066872
19    0.059132
20    0.053447
21    0.052100
22    0.054749
23    0.042900
Name: time, dtype: float64
In [132]:
burglaries_by_time.plot(kind='bar', rot=0, title="Burglaries by Time of the Day", figsize=(10, 8))
plt.ylabel("Number of Burglaries")
plt.show()

It might be expected that most burglaries occur at nighttime. However, according to the above bar plot, most burglaries in Chicago are reported around 8am, 9am, and 12pm. Because the values are divided by the number of hours in a year, multiplying them by 24 gives the daily figure: roughly two burglaries in each of these hours every day. Burglaries are least likely to occur from 1am to 6am. A possible reason for this trend is the same as above: the number of burglaries rises at the times when most people leave home for work, so it looks like burglars prefer empty homes.
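
The hourly values printed above were divided by 8,760 (hours in a year), so they are not daily counts. As a small sketch, they can be rescaled to "burglaries per day during that hour" by multiplying by 8,760 and dividing by 365. The three values are copied from the output; the variable names are illustrative.

```python
HOURS_PER_YEAR = 8760
DAYS_PER_YEAR = 365

# Scaled values for 8am, 9am, and 12pm, copied from the output above
scaled = {8: 0.083174, 9: 0.083059, 12: 0.084384}

# Undo the per-hour-of-year scaling, then spread the yearly count over 365 days
per_day = {
    hour: value * HOURS_PER_YEAR / DAYS_PER_YEAR
    for hour, value in scaled.items()
}

for hour, rate in per_day.items():
    print(f"{hour:02d}:00  {rate:.2f} burglaries per day")
```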

These observations seem interesting. Let's also visualize them in a line plot.

In [133]:
burglaries_by_time.plot(rot=0, title="Burglaries by Time of the Day", figsize=(10, 8))
plt.ylabel("Number of Burglaries")
plt.show()

This line plot confirms the observations from our bar plot and shows the big difference in burglary occurrence between different times of the day.

Now that we have determined when burglaries are most likely to occur, let's observe where they are most likely to occur.

We will do this by creating an interactive heat map indicating the areas of Chicago with the highest concentration of burglaries.

The burglary table contains a very large number of data points, which would make our heat map cluttered and hard to read. To improve readability, we will use a random sample of 10,000 burglaries.

In [134]:
# Take random sample of 10,000 rows
sample_table = burglary_table.sample(n=10000)
# Display first 5 rows
sample_table.head()
Out[134]:
id primary_type description arrest location latitude longitude date year month day time
272605 10378071 BURGLARY FORCIBLE ENTRY False {'latitude': '41.943475071', 'longitude': '-87... 41.943475071 -87.742038282 2016-01-13T11:00:00.000 2016 1 2 11
750983 11124585 BURGLARY UNLAWFUL ENTRY False {'latitude': '41.793030254', 'longitude': '-87... 41.793030254 -87.695002387 2017-10-19T20:30:00.000 2017 10 3 20
807975 11199994 BURGLARY ATTEMPT FORCIBLE ENTRY False {'latitude': '41.936680837', 'longitude': '-87... 41.936680837 -87.771643445 2018-01-09T13:50:00.000 2018 1 1 13
933309 11367756 BURGLARY FORCIBLE ENTRY False {'latitude': '41.73641031', 'longitude': '-87.... 41.73641031 -87.702044794 2018-07-02T19:00:00.000 2018 7 0 19
680342 11029185 BURGLARY FORCIBLE ENTRY False {'latitude': '41.783311684', 'longitude': '-87... 41.783311684 -87.609645926 2017-07-20T12:00:00.000 2017 7 3 12
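
The sample above changes on every run. If a repeatable map is wanted, `DataFrame.sample` accepts a `random_state` seed. The sketch below uses a small stand-in table, and the seed value 42 is arbitrary.

```python
import pandas as pd

# Stand-in for burglary_table
demo = pd.DataFrame({"id": range(100)})

# The same seed yields the same rows on every run
first = demo.sample(n=10, random_state=42)
second = demo.sample(n=10, random_state=42)

print(first.equals(second))
```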

To map our sample, we will be using the folium package.

In [135]:
# Create map
map_osm = folium.Map(location=[41.88, -87.63], zoom_start=11)
# Drop rows where location is missing
heat_table = sample_table[sample_table['location'].notna()]
# Get heat data from sample
heat_data = [[row['latitude'], row['longitude']] for index, row in heat_table.iterrows()]
# Create heat map
HeatMap(heat_data, radius=20).add_to(map_osm)
    
map_osm
Out[135]:
Make this Notebook Trusted to load map: File -> Trust Notebook
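
The coordinates arrive from the API as strings. As a hedged alternative to the row-by-row loop in the cell above, the heat map input could be built by converting both columns to numbers and dropping rows without coordinates. The stand-in table below uses two coordinate pairs from the sample shown earlier plus one missing row.

```python
import pandas as pd

# Stand-in for sample_table: string coordinates, one row with no location
demo = pd.DataFrame({
    "latitude": ["41.943475071", None, "41.73641031"],
    "longitude": ["-87.742038282", None, "-87.702044794"],
})

# Convert to floats (unparseable values become NaN), then drop incomplete rows
coords = (
    demo[["latitude", "longitude"]]
    .apply(pd.to_numeric, errors="coerce")
    .dropna()
)

# List of [lat, lon] pairs, the shape passed to HeatMap in the cell above
heat_data = coords.values.tolist()
print(heat_data)
```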

Now let's make our map a bit more descriptive, by adding some more data.

We will be creating circles, indicating the location of each burglary. By clicking on the circles, one will be able to see the incident description. Additionally, green circles will indicate burglaries where the offender has been arrested, while black circles will mean that the offender has not been arrested.

In [136]:
# Add circles
for index, row in heat_table.iterrows():
    # Green marks an arrest, black marks no arrest
    color = 'green' if row['arrest'] == True else 'black'

    folium.Circle(
        radius=20,
        location=[row['latitude'], row['longitude']],
        popup=row['description'],
        color=color,
        fill=True,
    ).add_to(map_osm)
    
map_osm
Out[136]:
Make this Notebook Trusted to load map: File -> Trust Notebook

A large proportion of the circles on the map above are black, which suggests that most burglaries do not lead to an arrest. Let's visualize this in a pie chart.

In [137]:
burglary_table['arrest'].value_counts().plot(kind='pie', figsize=(10, 10), title="Burglars Arrested", autopct='%1.1f%%')
plt.ylabel("")
plt.show()

From the plot above, we can see that a surprisingly low percentage of burglaries lead to an arrest: 94.8% of the burglaries in our table have no arrest recorded.
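
The percentages in the pie chart can also be read as numbers with `value_counts(normalize=True)`. The sketch below uses made-up arrest flags purely to show the call, not real data.

```python
import pandas as pd

# Made-up flags for illustration only: 1 arrest out of 20 burglaries
arrests = pd.Series([True] + [False] * 19)

# Share of each value; on the real data this would be burglary_table['arrest']
shares = arrests.value_counts(normalize=True)
print(shares)

# For boolean flags, the mean is the arrest rate
arrest_rate = arrests.mean()
print(arrest_rate)
```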

Now let's plot some other interesting statistics, using our complementary datasets.

We will use pie charts to visualize the age, sex, and race distributions of burglary offenders and victims.

In [151]:
offender_age[:7].plot(kind='pie', y='Value', figsize=(10,10), autopct='%1.1f%%', title="Burglars by Age")
plt.ylabel("")
plt.show()
In [139]:
offender_sex.plot(kind='pie', y='Value', figsize=(10,10), autopct='%1.1f%%', title="Burglars by Sex")
plt.ylabel("")
plt.show()
In [140]:
offender_race.plot(kind='pie', y='Value', figsize=(10,10), autopct='%1.1f%%', title="Burglars by Race")
plt.ylabel("")
plt.show()
In [152]:
victim_age[:10].plot(kind='pie', y='Value', figsize=(10,10), autopct='%1.1f%%', title="Victims by Age")
plt.ylabel("")
plt.show()
In [142]:
victim_sex.plot(kind='pie', y='Value', figsize=(10,10), autopct='%1.1f%%', title="Victims by Sex")
plt.ylabel("")
plt.show()
In [153]:
victim_race[:5].plot(kind='pie', y='Value', figsize=(10,10), autopct='%1.1f%%', title="Victims by Race")
plt.ylabel("")
plt.show()
In [155]:
victim_offender_relationship[:12].plot(kind='pie', y='Value', figsize=(10,10), autopct='%1.1f%%', title="Relationship between Victim and Offender", subplots=True)
plt.ylabel("")
plt.show()
In [157]:
linked_offenses[:15].plot(kind='pie', y='Value', figsize=(10,10), autopct='%1.1f%%', title="Offenders linked to other offenses", subplots=True)
plt.ylabel("")
plt.show()

4. Insight & Observations

For the last stage of the data lifecycle, we will be utilizing the analysis we conducted to derive some insights and observations about burglaries in Chicago.

The number of burglaries seems to have decreased every year since 2016, which suggests that Chicago is becoming safer, at least with respect to burglary.

Most burglaries occur in the summer months, on weekdays, between 8am and 12pm. Our analyses of burglaries by month, day, and time agree with each other and all support the same hypothesis: burglars prefer vacant homes, where the chance of confrontation is lower.

The highest concentration of burglaries seems to be in the center of the city. Beyond that, there does not appear to be any obvious geographic trend. One plausible but untested assumption is that wealthier, less secure households have a higher chance of being burglarized.

Among offenders whose details are recorded in the FBI data, the most common profile is white, male, and aged 20-29. In the Chicago data, around 95% of burglaries have no arrest recorded.
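
Because the "Unknown" category is large in the offender tables, it can help to look at shares among known values only. The sketch below does this for sex and race using the counts shown earlier; the helper name `known_share` is hypothetical.

```python
import pandas as pd


def known_share(table):
    """Share of each category after dropping the 'Unknown' row."""
    known = table.drop(index="Unknown")["Value"]
    return known / known.sum()


# Values copied from the offender tables earlier in this notebook
sex = pd.DataFrame(
    {"Key": ["Male", "Female", "Unknown"], "Value": [799186, 185725, 472674]}
).set_index("Key")
race = pd.DataFrame(
    {
        "Key": ["Asian", "Native Hawaiian", "Black or African American",
                "American Indian or Alaska Native", "White", "Unknown"],
        "Value": [6421, 0, 293696, 10945, 618786, 526479],
    }
).set_index("Key")

sex_share = known_share(sex)
race_share = known_share(race)
print(sex_share.round(3))
print(race_share.round(3))
```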

A considerable number of burglary victims seem to know the burglar in some way. Only about 20% of the recorded victim-offender relationships are "Stranger", although "Relationship Unknown" is the largest single category.